An Inverted Index for Storing and Retrieving Grammatical Dependencies

Authors

  • Michaela Atterer
  • Hinrich Schütze
Abstract

Web count statistics gathered from search engines have been widely used as a resource in a variety of NLP tasks. For some tasks, however, the information they exploit is not fine-grained enough. We propose an inverted index over grammatical relations as a fast and reliable resource to access more general and also more detailed frequency information. To build the index, we use a dependency parser to parse a large corpus. We extract binary dependency relations, such as he-subj-say (he is the subject of say) as index terms and construct the index using publicly available open-source indexing software. The unit we index over is the sentence. The index can be used to extract grammatical relations and frequency counts for these relations. The framework also provides the possibility to search for partial dependencies (say, the frequency of he occurring in subject position), words, strings and a combination of these. One possible application is the disambiguation of syntactic structures.
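
The indexing scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the parser nor the indexing software, so this toy version uses a plain in-memory dictionary, and the function names, the `*` wildcard notation for partial dependencies, the `word:` prefix, and the three-sentence corpus are all assumptions made for illustration. Only the he-subj-say term format and the choice of the sentence as the indexed unit come from the text.

```python
from collections import defaultdict

def index_terms(dep):
    """Expand one (dependent, relation, head) triple into index terms.

    The '*' wildcard form for partial dependencies is an assumption of this sketch.
    """
    dependent, rel, head = dep
    return {
        f"{dependent}-{rel}-{head}",  # full relation, e.g. he-subj-say
        f"{dependent}-{rel}-*",       # partial: word in a given grammatical role
        f"*-{rel}-{head}",            # partial: head with that role filled
    }

def build_index(parsed_sentences):
    """Map each index term to the set of sentence ids containing it."""
    index = defaultdict(set)
    for sid, (tokens, deps) in enumerate(parsed_sentences):
        for tok in tokens:
            index[f"word:{tok}"].add(sid)  # plain word terms (hypothetical prefix)
        for dep in deps:
            for term in index_terms(dep):
                index[term].add(sid)
    return index

def search(index, *terms):
    """Return ids of sentences that contain every given term."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

# Hand-written stand-in for dependency parser output (hypothetical data).
corpus = [
    (["he", "said", "hello"], [("he", "subj", "say"), ("hello", "obj", "say")]),
    (["he", "left"], [("he", "subj", "leave")]),
    (["she", "said", "nothing"], [("she", "subj", "say"), ("nothing", "obj", "say")]),
]
index = build_index(corpus)

full = search(index, "he-subj-say")                      # a full relation
partial = search(index, "he-subj-*")                     # he in subject position
combined = search(index, "*-subj-say", "word:nothing")   # relation plus word
```

Because the sentence is the indexed unit, `len(search(...))` in this sketch counts sentences containing a relation rather than individual occurrences, and a word that itself contains a hyphen would need escaping before being used in a term.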

Similar resources

Compressing Inverted Index Using Optimal FastPFOR

Indexing plays an important role for storing and retrieving the data in Information Retrieval System (IRS). Inverted Index is the most frequently used indexing structure in IRS. In order to reduce the size of the index and retrieve the data efficiently, compression schemes are used, because the retrieval of compressed data is faster than uncompressed data. High speed compression schemes can imp...

Text retrieval and the relational model

In this article, Macleod examines the suitability of the relational model in the context of storing and retrieving documents. The relational model is compared with other traditional models in terms of their strengths and weaknesses. Throughout the article, Macleod largely compares the relational model against the text model. He states that ‘the terminology text model is used to refer to i...

Inverted indexes: Types and techniques

There has been a substantial amount of research on high-performance inverted indexes because most web and search engines use an inverted index to execute queries. Documents are normally stored as lists of words, but inverted indexes invert this by storing for each word the list of documents that the word appears in, hence the name “inverted index”. This paper presents the crucial research findin...

Searching Large Lexicons for Partially Specified Terms using Compressed Inverted Files

There are many advantages to be gained by storing the lexicon of a full text database in main memory. In this paper we describe how to use a compressed inverted file index to search such a lexicon for entries that match a pattern or partially specified term. This method provides an effective compromise between speed and space, running orders of magnitude faster than brute force search, but requ...

Aggregation-Aware Top-k Computation for Full-Text Search

A typical scenario in information retrieval and web search is to index a given type of items (e.g., web pages, images) and provide search functionality for them. In such a scenario, the basic units of indexing and retrieval are the same. Extensive study has been done for efficient top-k computation in such settings. This paper studies top-k processing for many emerging scenarios: efficiently re...


Publication date: 2008